Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System

Authors

  • Maria Teresa Pazienza
  • Armando Stellato
  • Michele Vindigni
Abstract

E-commerce and the continuous growth of the WWW have given rise to a new generation of e-retail sites. A number of commercial agent-based systems have been developed to help Internet shoppers decide what to buy and where to buy it from. In such systems, ontologies play a crucial role in supporting the exchange of business data, as they provide a formal vocabulary for the information and unify different views of a domain in a shared and safe cognitive approach. In CROSSMARC (a European research project supporting the development of an agent-based multilingual/multi-domain system for information extraction (IE) from web pages), a knowledge-based approach has been combined with machine learning techniques (in particular, wrapper-induction-based components) in order to design a robust system for extracting information from relevant web sites. In the ever-changing Web framework, this hybrid approach supports adaptivity to newly emerging concepts and a certain degree of independence from the specific web sites considered in the training phase.

Similar resources

Populating Ontologies with Data from OCRed Lists

A flexible, accurate, and efficient method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selectio...

Ontology-Based Document Clustering with a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

DIADEM: Thousands of Websites to a Single Database

The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of websites has long proven elusive, despite its central role in the “web of data”. Through an extensive evaluation spanning over 10000 web sites ...

Populating Ontologies with Data from Lists in Family History Books

A flexible, accurate, and cost-effective method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its sel...

The Wrapper Induction Environment

There is much interest in systems that automatically interact with Internet information sites. Such systems are hard to build, partly because they use hand-crafted wrappers to extract a site’s content. We advocate wrapper induction, a technique for automatically learning wrappers. Our wrapper induction environment (WIEN) enables users to quickly capture a set of example pages; our wrapper learnin...

Publication date: 2003